Identification and validation of fusidic acid and flufenamic acid as inhibitors of SARS-CoV-2 replication using DrugSolver CavitomiX

In this work, we present DrugSolver CavitomiX, a novel computational pipeline for drug repurposing and identifying ligands and inhibitors of target enzymes. The pipeline is based on cavity point clouds representing physico-chemical properties of the cavity induced solely by the protein. To test the pipeline’s ability to identify inhibitors, we chose enzymes essential for SARS-CoV-2 replication as a test system. The active-site cavities of the viral enzymes main protease (Mpro) and papain-like protease (Plpro), as well as of the human transmembrane serine protease 2 (TMPRSS2), were selected as target cavities. Using active-site point-cloud comparisons, it was possible to identify two compounds—flufenamic acid and fusidic acid—which show strong inhibition of viral replication. The complexes from which fusidic acid and flufenamic acid were derived would not have been identified using classical sequence- and structure-based methods as they show very little structural (TM-score: 0.1 and 0.09, respectively) and very low sequence (~ 5%) identity to Mpro and TMPRSS2, respectively. Furthermore, a cavity-based off-target screening was performed using acetylcholinesterase (AChE) as an example. Using cavity comparisons, the human carboxylesterase was successfully identified, which is a described off-target for AChE inhibitors.

www.nature.com/scientificreports/ the polyprotein and makes Pl pro a promising drug target. The predicted inhibitors were tested for their ability to inhibit the target enzymes in wet-lab experiments using a Caco-2 cell-based viral-infection assay.
In the second validation experiment, we tested the ability of the Catalophore™ based DrugSolver CavitomiX platform to identify off-targets. For this application, the example of acetylcholinesterase (AChE) was chosen. AChE is an essential enzyme in the central nervous system of animals and the target for drugs against Alzheimer's disease (AD) 28 as well as the target of the highly potent organophosphorus nerve agents, like Sarin or Novichok 29 . AChE terminates neurotransmission at cholinergic synapses by hydrolyzing acetylcholine (ACh), consequently regulating the neurotransmitter's concentration 30 . In AD treatment, that process is slowed down through inhibitors to enhance the low concentrations of ACh in the synaptic cleft. The Novichok agents, on the other hand, covalently bind to the catalytic triad of AChE and thus permanently inactivates AChE, leading to lethal levels of ACh accumulation 29 . The identification of off-targets is highly interesting since it helps to predict and potentially avoid possible side effects of AD drugs or harmful effects of nerve agents. Off-targets are identified by the similarity of their binding-site cavities to the active-site cavity of AChE. To this end, a search was performed in an uncurated dataset of cavities of PDB-proteins (i.e. all cavities from PDB proteins, calculated with standard parameters, without any manual selection or curation).

Results
Calculated active-site cavities and cavity representatives for different dynamic states. First all detectable cavities of the main protease, TMPRSS2, and papain-like protease were calculated; the Catalo-phore™ technology 5,6 makes it possible to represent cavities as 3D-point clouds and attribute a set of physicochemical properties to each cavity point. After that, the cavity aligning the active-site residues was selected as the active-site cavity. (as shown in Fig. 2). Then, MD simulations were carried out: 10 ns (TMPRSS2, PI Pro ) and 100 µs (M pro ). 1000 snapshot structures per protein were created along the trajectory and the active-site cavities were calculated in all these structures. Using hierarchical clustering methods all the resulting active-site cavities were grouped into clusters based on point cloud similarities. These clusters can be interpreted to represent different states of the cavities created by protein dynamics. For each larger cluster, a cavity representative was calculated and used in an additional set of cavity comparisons. This way, protein dynamics and their influence on protein binding pockets can be considered.
Cavity-based identification of potential inhibitors. As mentioned above, the following strategy was used to identify potential inhibitors for M pro , Pl pro , and TMPRSS2: similar cavities to the calculated active-site cavity representatives with bound ligands were identified. Enzymes with a similar active site should show similar small-molecule-binding properties. The calculated cavity representatives were compared with a set of cavities from the PDB with the condition that the cavity must contain a co-crystallized ligand. This way, it was possible to gather a dataset of approximately 100,000 cavities with a known binding partner to which the cavity cluster representatives could be compared. The ligands of the most similar cavities to the target cavities were selected for testing their inhibitory effect on SARS-CoV-2 replication.  www.nature.com/scientificreports/ Potential inhibitors identified for TMPRSS2 and the SARS-CoV-2 protease. The following compounds (shown in Fig. 3) were identified as potential inhibitors for the SARS-CoV-2 M pro , SARS-CoV-2 Pl pro , and the human TMPRSS2 by the high similarity of the compound-binding cavities to the active-site cavities of M pro /Pl pro /TMPRSS2.

Cell-based virus replication assay test to determine the inhibitory effects of selected compounds on the replication of SARS-CoV-2. First compounds targeted against the M pro were tested on
Vero CCL81 cells at the highest concentration where no cytotoxic effect were observed (Fig. S1). None of the tested substances showed a strong antiviral effect, except chloroquine, which was used as a control substance as it is known to inhibit viral replication in Vero cells 31 despite not showing therapeutic benefit in clinical trials 33 . Only low antiviral activity was found for resveratrol. This substance is known to bind unspecifically to many proteins and influences a broad variety of cellular factors. Binding to the M pro cannot be excluded, but many other factors may also be influenced by resveratrol and result in the low antiviral effect observed 34 . In parallel, a cell-based virus replication assay suitable to test TMPRSS2 inhibitors was established. A Caco-2 cell-based assay was chosen since Caco-2 cells were shown to express TMPRSS2 (Fig. S4). This assay was used to test potential inhibitors for TMPRSS2 and also for testing potential inhibitors of M pro and Pl pro , which were not tested in the first round using the Vero cell-based assay. Caco-2 cells were infected at a high viral load (MOI: 0.825) to identify a strong antiviral effect of the compounds and to exclude possible false-positive results. Samples were taken after 48 h and analyzed with qRT-PCR for the quantification of viral replication 35,36 . Two compounds, fusidic acid and flufenamic acid, show a antiviral effect comparable with the known TMPRSS2 inhibitor Camostat Mesylate as shown in Fig. 4. Additionally, the antiviral effect was confirmed using immunohistochemistry staining of the infected cells. In this semiquantitative assay, infected cells were visualized utilizing a SARS-CoV-2 antibody directed against the nucleocapsid protein (N). As shown in Fig. S3, fewer cells got infected compared to an infected control when treated with fusidic acid or flufenamic acid during the assay and incubation period.

Fusidic acid and flufenamic acid were identified as inhibitors of viral replication. Fusidic
acid. Active-site cavity representatives of TMPRSS2 show a high similarity to the ligand-binding cavity of the Escherichia coli enzyme chloramphenicol acetyltransferase 3 (PDB-ID: 1QCA). The corresponding cavity match can be seen in Fig. 5a. Chloramphenicol acetyltransferase and TMPRSS2 show no significant sequence identity (~ 5%) or structural similarity (Tm score: 0.09). Fusidic acid is a potent competitive inhibitor of chloramphenicol acetyltransferase 3 37 and is an approved drug that has been used as an antibiotic since the 1960s 38 , making it an attractive candidate as a potential inhibitor of TMPRSS2.
Flufenamic acid. The active-site cavity representatives of the M pro show a high similarity to a cavity of transcription factor TEAD2 with bound flufenamic acid (Fig. 5b). The main protease and the transcription factor TEAD2 are sequentially and structurally diverse and therefore show a very low sequence identity of about 4% and a very low Tm score of 0.1. Flufenamic acid was invented and approved in the 1960s 39 and is mainly used as an analgesic drug in treating the pain associated with rheumatoid diseases 40 .  Fig. 6. CES1 is known to bind AChE inhibitors, making it an off-target for drugs targeting AChE. The fact that it can be identified using the Catalophore™ DrugSolver platform shows that also off-target identification is possible, despite the actual geometric shape of the cavities, which were identified to have similar properties, differ significantly.

Discussion
In this work, we established a workflow based on active site cavity comparisons, capable of identifying inhibitors for selected target enzymes. This technology enables a novel approach towards drug repurposing since it is independent of structure and sequence alignments. DrugSolver CavitomiX delivered two already approved drugs that show anti-viral activity against SARS-CoV-2 in infection assays: fusidic and flufenamic acid. These two compounds were found as proteins binding them had similar cavities as our target enzymes. Despite their high active-site cavity similarity, they show very little structural (TM-score 41 of 0.09 and 0.1) and very low sequence (around 5%) identity to M pro and TMPRSS2, respectively. Thus, these two proteins (and consequently their ligands) would never have been identified using sequence-and structure-based comparisons. Now that the computational pipeline is established, it can quickly be modified and adapted to address new pathogens as needed. In future versions, a combination with technologies such as DrugSolver pathogen-seqscan 42 , which enables the automatic detection of druggable targets in new pathogens and is part of Innophore's Drug-Solver platform, makes it possible to address newly occurring pathogens even faster. Also, in the case of mutating drug targets, like SARS-CoV-2 M pro43 , the mutated targets can quickly be re-addressed if needed.
In the last years, more and more efforts have been devoted to the development of novel methods of protein structure prediction. Methods like AlphaFold2 2 , RoseTTAFold 4 , ESM fold 3 , and the BioNeMo toolbox (https:// www. nvidia. com/ en-us/ gpu-cloud/ bione mo/) were made available to the public, and follow-up versions of these methods are currently in development. These new methods, among other drug repurposing approaches 44 , make DrugSolver CavitomiX even more potent. It enables its users to quickly predict the proteome of newly emerging pathogens. Using the Catalophore™ CavitomiX approach, the cavitome of new pathogens can be calculated, and the here-described workflow enables the rapid identification of potential inhibitors.
CES1 is already described to be a target of AChE inhibitors. The enzyme was even suggested to be applied as a bio-scavenger to protect against organophosphorus nerve agents, which are known to bind to AChE 45 . Finding the CES1 among the best hits in the Catalophore™ workflow is a promising result, showing the potential of the Catalophore™ DrugSolver CavitomiX to be used for off-target identification. A combination of drug identification www.nature.com/scientificreports/ and off-target identification in one workflow has great potential. Identifying off-targets is of great interest for predicting and preventing side effects. The given example here serves as a proof-of-principle, and the off-target identification using 3D cavity property point cloud information will be further improved in future work.

Methods
Homology model. For TMPRSS2 and Pl pro , SWISS-models 46-49 were built using default settings. While Pl pro had an 83% sequence identity with its crystal template (PDB-ID: 5y3e), the best template for TMPRSS2 revealed only 38% identity with serine protease Hepsin (PDB-ID: 5ce1) automatically selected as a template. For the main protease, we used the model as well as MD-simulation data of the D.E. Shaw research center, which had been published in March 2020 50 .

Molecular dynamics simulations.
Own MD simulations have been carried out using Yasara version 20.4.24 51 with the Amber03 force field. For each system, 1000 structural snapshots with a constant time interval of 10 ps in between had been sampled, covering a total time range of 10 ns. However, regarding M pro MD simulations of 100 µs length, run on Anton supercomputers at D.E. Shaw research, the 1000 extracted snapshots had 100 ns intervals in between. Snapshots of all systems were used for the next workflow steps.

Calculation of cavity point clouds.
For the calculation of the cavity point clouds, the Catalophore™ technology 5,6 is used. The workflow scheme to calculate protein cavities is shown in Fig. 7. The cavity calculation is based on a protein structure. This can be an experimental structure, a homology model, or a structure derived from an MD simulation. The first step is the preparation step: here, hydrogen atoms and potentially missing residues are added, and alternates are handled. For cavity detection, the ligsite algorithm 52 is used. The algorithm searches for protein-non-protein-protein events on each grid point in a line on the axes. The analysis of the grid points helps identify closed cavities and points that are accessible to the solvent. www.nature.com/scientificreports/ Grid settings are important to find the same cavities with different orientations in space, and the defined axis directions are usually sufficient to determine cavities appropriately. Additionally, a softness approach is employed to compensate for slight variations in grid settings and eliminate small surface cavities without altering the grid spacing or increasing computation time. After detection, a set of 19 properties using AutoGrid4 53 is assigned to the cavity points. These properties are derived from the three-dimensional arrangement of the amino acids and special surroundings like cofactors and or ligands within the 3D structure. For each physio-chemical property, one property point cloud is calculated. The result is a collection of pooled cavity grid points, each assigned with physico-chemical properties and additional derived values like hydrophobicity and accessibility values. These points can be compared and aligned with cavity point clouds from other proteins based on their individual physico-chemical properties.
Comparison of two cavities. For the comparison of two cavities, the point clouds from the query and select point clouds from a pre-computed database are matched. More precisely, the point clouds are superimposed and the superposition optimized. To achieve this, differences are evaluated by comparing the closest grid points between the target and the template.
For each property point-cloud pair, a "property score" is generated, representing the quality of the match. A set of weights allows for fine-tuning the influence of the different grid maps (for instance, setting the scoring factor for the H-bond point clouds to zero causes this interaction to be ignored). This scoring is combined with a geometric match into a "total score" and an iterative-closest-point algorithm is performed to optimize it and determine its final value. www.nature.com/scientificreports/ Clustering and identification of cavity-cluster representatives. As described above, we ran MD simulations in order to encode the dynamics of the entire protein and, in particular, the dynamics of the activesite cavity. During each simulation, we created 1000 snapshots of each corresponding MD trajectory. The activesite point clouds were calculated from each snapshot like they would be from the origin protein structure. This procedure leads to a set of 1000 point clouds representing the active-site-cavity snapshots for each protein. In order to identify different representatives of the dynamic states of the active-site cavities, we compared the point clouds from all snapshots via an all-to-all match. The results from such a matrix match are visualized in the heatmaps in Fig. 8. The identification of dynamical representatives was carried out via hierarchical clustering based on the total-score difference matrix. In fact, the heatmaps in Fig. 8 are already ordered based on the clustering results.
As for choosing the representatives, we limited ourselves to the largest clusters, in particular those that contained more than 5% of the total number of point clouds. This emphasizes those states that are present for large enough amounts of time during the simulation in order to warrant further study. The clusters are illustrated in the side elements next to the heatmaps in Fig. 8, where snapshot location in MD-simulation time is plotted versus the clustering, each cluster is colored differently, and each representative is marked by a diamond (irrespective of cluster size).
A cluster representative was calculated for each large cluster via a minimal-distance requirement to all other cluster members. These representatives were used further in the following steps of the workflow.

Selection of potential inhibitors.
Following the comparison of the cavity datasets, a selection process was undertaken to identify the most promising potential inhibitors. Each cavity in the datasets is assigned a similarity score, reflecting its resemblance to the query cavity. These similarity scores were then utilized to rank the cavities, enabling the identification of the most closely aligned cavities to the query cavity. Additionally, to ensure that only significant results for potential inhibitor binding were considered, cloud-overlap cut-offs (cloud overlap 1 and 2) were considered. These cut-offs, set at 75 and 50 percent, quantified the extent of overlap between two-point clouds during comparison. By applying these criteria, only the most relevant results were retained. Furthermore, cavities containing bound small molecules present in the crystallization conditions were discarded from further analysis. For example, small molecules like polyethylene glycol were considered irrelevant to the investigation and were therefore excluded. Following these steps, the ligands associated with the most similar cavities were investigated and carefully selected for further examination. The availability and accessibility of small molecules were taken into account, ensuring that only those small molecules that could be readily purchased for further experimentation were considered.

Cytotoxicity assay.
To determine the cytotoxic effect of the substances used, Caco2 cells were seeded in 96 well plates at a density of 5000 cells/well in Gibco™ Minimum Essential Medium (MEM with 10% FCS, 1% Pen-Strep) and incubated for 48 h at 37 °C and 5% CO 2 . The medium was then removed, and the respective substance, diluted first in EtOH, then in the respective medium (EtOH concentration was 1% in all conditions), was     Flow cytometry. Caco2 cells were harvested by incubation in 1 × accutase for 10 min, counted, and centrifuged at 400×g for 5 min at room temperature before washing and resuspension in FACS buffer (PBS supplemented with 0.05% sodium azide and 0.5% bovine serum albumin). A total of 1 × 10 6 cells was incubated in FACS buffer containing the primary antibodies for ACE2 (LSBio, LS-C344721, mAb, mah, 1:500) and TMPRSS2 (abcam, ab280567, mAb, rah,1:200) or IgG1 (Agilent Technologies, X0931, 1:1000) in case of the control for 30 min at 4 °C. The cells were then washed in 2 mL of FACS buffer, centrifuged at 400×g for 5 min, and incubated in FACS buffer containing the secondary antibodies Alexa Fluor 647 goat-anti-mouse IgG (Invitrogen, A21235, 1:700) and Alexa Fluor 488 goat-anti-rabbit (Invitrogen, A11034, 1:700) in the dark for 20 min at 4 °C. After washing and centrifugation, the cells were resuspended in 1.5 mL FACS buffer before adding 0.5 mL 4% PFA for 10 min for fixation. After another washing step, the cells were resuspended in 400 µL FACS buffer, transferred to FACS tubes, and kept on ice in the dark until further use. Flow cytometry was performed using a CytoFLEX S flow cytometer equipped with the CytExpert Software (both Beckman Coulter). A typical cell area was gated, and 10,000 single events per sample were acquired. Paraphrasing was performed on some sentences using the ChatGPT language model (OpenAI). The original text was inputted into the model, and the resulting paraphrased text was compared to the original to ensure accuracy and preserve the intended meaning. Any necessary revisions were made by the authors.

Data availability
The CavitOmiX plugin for Schrodinger's PyMOL, a tool that allows analyzing protein cavities from any input structure, is freely available for download at https:// innop hore. com/ cavit omix/ and https:// pymol wiki. org/ index. php/ Cavit OmiX. It enables the analysis of protein structures, Catalophore cavities, and binding sites using crystal structures and AI models from OpenFold (powered by NVIDIA's BioNeMo service), DeepMind`s AlphaFold, and ESMFold by Meta. If required, additional supplementary data can be obtained from the authors on request.